You can work with Stata in a Python notebook by using the package ipystata. Just like rpy2, which allows us to use R in Python, it lets us use both (or, if you want, all three!) programming languages in one notebook.
Let's start by importing all the packages we want to use.
In [1]:
import numpy as np
import pandas as pd
import ipystata
%pylab --no-import-all
%matplotlib inline
In [2]:
%%stata?
Let's run some commands in Stata from this notebook, using the same code as in the Stata Notebook Example. To do so, we will use the %%stata magic.
In [3]:
%%stata
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
Notice that it returned everything except the graph. To get the graph, we need to pass the -gr option to the %%stata magic.
In [4]:
%%stata -gr
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
Out[4]:
It looks like something is preventing Stata from passing the figure back to Jupyter. Nonetheless, we can save it in Stata and open it here.
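One thing worth checking before exporting: the export below writes to ./graphs, and Stata is not expected to create a missing folder on its own. A minimal sketch that creates the folder from Python first, assuming Stata's working directory is the same as the notebook's:

```python
from pathlib import Path

# Folder targeted by `graph export` in the Stata cell.
# Assumption: Stata and the notebook share the same working directory.
graph_dir = Path("./graphs")
graph_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
print(graph_dir.resolve())
```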
In [5]:
%%stata -gr
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
graph export "./graphs/price-mpg.png", replace
Out[5]:
Let's import the figure to our notebook.
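A minimal sketch of one way to do this, using IPython's display tools. The helper name show_graph is hypothetical (it is not part of ipystata), and the path is the one used in the graph export command above.

```python
from pathlib import Path


def show_graph(path):
    """Display a saved image inline; return False if the file is missing."""
    path = Path(path)
    if not path.exists():
        print(f"{path} not found; run the Stata cell with graph export first.")
        return False
    # Imported lazily so the helper can also be defined outside a notebook.
    from IPython.display import Image, display
    display(Image(filename=str(path)))
    return True


shown = show_graph("./graphs/price-mpg.png")
```

The existence check is only there to give a readable message when the export step has not run yet.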
As we have seen, Python is very powerful for data munging and cleaning, and its figures can look much nicer. But since we already know Stata for econometric analyses, let's use both languages to get the best of each. We can do this by passing additional options to %%stata. First, let's use the -o option to get the data in auto.dta from Stata as a pandas dataframe.
In [6]:
%%stata -o car_df
sysuse auto.dta
In [7]:
car_df
Out[7]:
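Because the result is a regular pandas dataframe, the usual inspection tools apply. The sketch below uses a small hand-typed stand-in (hypothetical makes and illustrative values, not the actual transfer) so that it runs without Stata; in the notebook, the same calls would be made on car_df. It also assumes that Stata missing values arrive as NaN.

```python
import numpy as np
import pandas as pd

# Hand-typed stand-in for car_df (illustrative values only).
demo_df = pd.DataFrame({
    "make": ["Car A", "Car B", "Car C"],
    "price": [4099, 4749, 3799],
    "mpg": [22, 17, 22],
    "rep78": [3.0, 3.0, np.nan],  # assumption: Stata missings become NaN
})

print(demo_df.dtypes)                          # column types after the transfer
print(demo_df.isna().sum())                    # missing values per column
print(demo_df[["price", "mpg"]].describe())    # rough analogue of Stata's summ
```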
In [8]:
# Import matplotlib
import matplotlib as mpl
# Import seaborn
import seaborn as sns
sns.set()
# paths
pathgraphs = './graphs/'
In [9]:
# Define our function to plot
def ScatterPlot(dfin, var0='mpg', var1='price', labelvar='make',
                dx=0.006125, dy=0.006125,
                xlabel='Miles per Gallon',
                ylabel='Price',
                linelabel='Price',
                filename='price-mpg.pdf'):
    '''
    Plot the association between var0 and var1 in dataframe dfin, using labelvar for labels.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    # Work on a copy and drop rows with missing values in the plotted variables
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    fig, ax = plt.subplots()
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    # Shift the labels slightly so they do not sit on top of the markers
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=14, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    plt.xlim([df[var0].min()-1, df[var0].max()+1])
    plt.ylim([0, df[var1].max()+1000])
    ax.tick_params(axis = 'both', which = 'major', labelsize=16)
    ax.tick_params(axis = 'both', which = 'minor', labelsize=8)
    ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
    #ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
In [10]:
ScatterPlot(car_df)
In [11]:
car_df['mpg_sq'] = car_df.mpg ** 2
In [12]:
%%stata -d car_df
reg price mpg mpg_sq rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
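The , r option asks Stata for heteroskedasticity-robust standard errors. As a rough cross-check on the Python side, the sketch below computes OLS coefficients with HC1 robust standard errors (the small-sample correction Stata applies for regress with robust) using only numpy. The function name ols_hc1 and the synthetic data are hypothetical; this is an illustration of the formula, not what ipystata or Stata runs internally.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients and HC1 robust standard errors.

    X must already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich estimator: X' diag(e^2) X
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 scales the sandwich by n / (n - k)
    cov = (n / (n - k)) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Synthetic, heteroskedastic data (hypothetical, not auto.dta): y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = rng.uniform(10, 40, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=x / 10, size=200)
X = np.column_stack([np.ones_like(x), x])

beta, se = ols_hc1(X, y)
print("coefficients:", beta)
print("robust SEs:  ", se)
```

With car_df, X would be built from the regressors in the Stata command (plus a constant) after dropping rows with missing values, as Stata does.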
If you want to perform additional tasks between both programs, you can check this example notebook by the author of ipystata or the ipystata website.